-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: ensure reindex / getitem to select columns properly copies data for extension dtypes #51197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: ensure reindex / getitem to select columns properly copies data for extension dtypes #51197
Conversation
+1. This seems like the kind of thing that I'd usually want to call a bugfix and you'd want to call a breaking change. Since its for 2.0 i'd be fine documenting it either way. |
…lumns-extensiondtypes
…lumns-extensiondtypes
Documented as a bug fix. We could also leave it as is, and delay the behaviour change to pandas 3.0 when Copy-on-Write would be enabled. That has the benefit of not having the performance hit in the meantime of making the copy, but the drawback that we keep the wrong behaviour for a longer time (giving more people the chance to rely on its side-effects). I would personally go for the correct behaviour right now. |
…lumns-extensiondtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is a fix, should do for 2.0
I encountered this while writing more tests for Copy-on-Write. Currently, the general rule is that selecting columns with a list-like indexer using getitem gives a copy:
However, that doesn't seem to be the case when the columns we select are extension dtypes. When using
dtype="Float64"
in the above example, the originaldf
gets updated because subset isn't a copy.While I am not sure this is an explicitly documented rule (AFAIK this is de-facto behaviour, and only described as such in the discussions related to copy/view and CoW), I do think it would be expected that extension dtypes behave the same as numpy dtypes on this front.
(it also makes writing tests for copy/view behaviour harder if the behaviour doesn't only change for CoW or not, but also depending on numpy vs extension dtypes. This is where I encountered the issue)
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.